Starting from a general ansatz, we show how community detection can be interpreted as finding the
ground state of an infinite range spin glass. Our approach applies to weighted and directed networks
alike. It contains the ad hoc introduced quality function from [1] and the modularity Q as defined
by Newman and Girvan as special cases. The community structure of the network is interpreted
as the spin configuration that minimizes the energy of the spin glass, with the spin states being the
community indices. We elucidate the properties of the ground state configuration to give a concise
definition of communities as cohesive subgroups in networks that is adaptive to the specific class of
network under study. Further, we show how hierarchies and overlap in the community structure can
be detected. Computationally efficient local update rules for optimization procedures to find the
ground state are given. We show how the ansatz may be used to discover the community around a
given node without detecting all communities in the full network, and we give benchmarks for the
performance of this extension. Finally, we give expectation values for the modularity of random
graphs, which can be used in the assessment of statistical significance of community structure.

PACS numbers: 89.75.Hc,89.65.-s,05.50.+q,64.60.Cn 
INTRODUCTION



The amount of empirical information that scientists from
all disciplines are dealing with is constantly increasing,
and so is the need for robust, scalable, and easy to use
clustering techniques for data abstraction, dimensionality
reduction, or visualization for many scientists performing
exploratory data analysis. A basic objective is to group
similar objects together and to keep dissimilar objects
apart. But already the question of how to measure
similarity or dissimilarity is a subject of discussion.
Two main approaches to clustering are identified in the
literature. On the one hand, there is hierarchical
clustering, where the data set is grouped into a hierarchy
of clusters from single items to the whole data set. Data
points are either joined successively in an agglomerative
manner, starting from the closest pair of data points, or
the data set is recursively partitioned into two parts, an
approach which is called divisive. On the other hand, there
is partitional clustering, where the data set is directly
partitioned into k different clusters, usually by optimizing
some quality function. The number of clusters k is either an
input parameter of the algorithm or found by the clustering
procedure itself. Transforming the similarity matrix into
a graph, e.g. by thresholding, the clustering problem can
be tackled from a graph partitioning point of view. These
approaches apply directly to networks or relational data
sets where the proximity information is given as a set of
pairwise relations, i.e. the edges of the network. The
problem is then approached by a min-cut technique that
partitions a connected graph into two parts, minimizing
the number of edges to cut. These approaches, however,
tend to produce very skewed partitions, as the min-cut is
usually found by cutting off only a very small subgraph.
A number of penalty functions have been suggested to
overcome this problem and balance the size of the subgraphs
resulting from a cut. Among these are ratio cuts,
normalized cuts [10], and min-max cuts.

Though today the development of these methods lies
mainly in the realm of computer science, the relations
between information theory and statistical physics
have brought about a number of such methods that are
based on principles from statistical mechanics or
analogies with physical models. When using spin models for
clustering of multivariate data, the similarity measures
are translated into coupling strengths, and either
dynamical properties such as spin-spin correlations are
measured or energies are interpreted as quality functions. A
ferromagnetic Potts model has been applied successfully
by Blatt et al. Bengtsson has used an anti-ferromagnetic
Potts model with the number of clusters k as input
parameter, where the assignment of spins in the ground
state of the system defines the clustering solution.

In recent years, renewed interest in the graph clustering
problem from the physics community has come under the
term "community detection". Communities are generally
understood as subsets of nodes that are more densely
interconnected among each other than with the rest of the
network. Sparked by the work of Girvan and Newman [15],
a number of other authors have developed new algorithms
for this problem that take very different approaches. The
recent reviews by Newman [16] and Danon et al. may serve
as introductory reading and include methodological
overviews and comparative studies of the performance of
different algorithms, including the one presented by the
authors in [2]. In this article, we intend to set the
basis for a unified framework under which community
detection may be viewed and which helps in understanding
the underlying properties of the problem.
First, we will show that the problem of community
detection can be mapped onto finding the ground state of
an infinite range Potts spin glass via a simple first
principles ansatz by combining the information from both
present and missing links. The energy of the spin system
is equivalent to the quality function of the clustering,
with the spin states being the group indices. In the above
taxonomy of clustering procedures, this corresponds to
a partitional method with the number of clusters
determined automatically by the algorithm as the number of
occupied spin states. A single parameter $\gamma$ relates
the weight given to missing and existing links in the
quality function and allows for an assessment of overlapping
and hierarchical community structures. Thereby, we can
bridge the gap between hierarchical and partitional
clustering and determine to what extent the cluster
structure of the network is hierarchical.

In contrast to methods based on dynamical properties
of the spin system that measure correlations between
spins, such as the super-paramagnetic Potts clustering
(SPC) introduced by Blatt et al., mapping the problem
to a ground state bears several advantages. First, it is
computationally less demanding, because we do not have
to keep track of an $N \times N$ correlation matrix of spin
states. Rather, every spin only carries its most probable
community index. If a probabilistic extension of
the method is required, an analysis of the overlap of the
community structures in different local minima of the
Hamiltonian can be performed. Second, the properties of
the ground state spin configuration lead to a direct
interpretation of the result in terms of graph
theoretical measures, which give an exact definition of
what a "community" is in this framework. The
interpretation of the parameter $\gamma$ in the evaluation
of hierarchy and overlap is much clearer than the
interpretation of the temperature in SPC. Third, the zero
temperature energy can be calculated analytically, which
allows us to give expectation values of the modularity and
to assess the clustering tendency of the graph under study.

For a natural choice of parameters, we recover from our
ansatz the "modularity" defined by Girvan and Newman as
well as the ad hoc quality function introduced previously.
Then we will derive a number of graph structural
properties that define the term "community" from the fact
that valid community structures correspond to minima of
the energy landscape of the system. We compare this
definition to other possibilities from the literature. We
then show how hierarchical and overlapping community
structures can be discovered in this framework. Even
though the quality function resembles an infinite range
spin glass with couplings between all pairs of nodes, we
show how efficient minimization routines can be
implemented that only need to consider interactions along
the links in the network and some global bookkeeping.
This makes the use of the method feasible even for large
systems. Furthermore, we show how a method of finding the
community around a given node can be developed in this
general framework and give benchmarks for this method.
All clustering procedures will find clusters even when
applied to random data. Hence, in the last part of the
paper, we focus on the statistical significance of
community detection. We show how community detection is
related to graph partitioning and that, when community
detection is applied to random graphs, equally sized
communities are found. From the known results for the cut
size of graph partitionings we can calculate expectation
values for the modularity of random graphs, which have to
be exceeded by any data set that is to be called truly
modular.

DERIVATION OF THE HAMILTONIAN 

For the term "community", "cluster", or "cohesive
subgroup", a number of different and sometimes conflicting
definitions exist. All of them have in common
that communities are understood as groups of densely
interconnected nodes that are only sparsely connected
with the rest of the network. Any quality function for
an assignment of nodes into communities should therefore
follow the simple principle: group together what is
linked, keep apart what is not. From this, we find four
requirements of such a quality function: it should a.)
reward internal edges between nodes of the same group
(in the same spin state) and b.) penalize missing edges
(non-links) between nodes in the same group. Further,
it should c.) penalize existing edges between different
groups (nodes in different spin states) and d.) reward
non-links between different groups. This leads to the
following function:

$$
\begin{aligned}
\mathcal{H}(\{\sigma\}) = {}
 & - \underbrace{\sum_{i \neq j} a_{ij} A_{ij}\, \delta(\sigma_i, \sigma_j)}_{\text{internal links}}
   + \underbrace{\sum_{i \neq j} b_{ij} (1 - A_{ij})\, \delta(\sigma_i, \sigma_j)}_{\text{internal non-links}} \\
 & + \underbrace{\sum_{i \neq j} c_{ij} A_{ij} \left(1 - \delta(\sigma_i, \sigma_j)\right)}_{\text{external links}}
   - \underbrace{\sum_{i \neq j} d_{ij} (1 - A_{ij}) \left(1 - \delta(\sigma_i, \sigma_j)\right)}_{\text{external non-links}}
\end{aligned}
\tag{1}
$$

in which $A_{ij}$ denotes the adjacency matrix of the graph,
with $A_{ij} = 1$ if an edge is present and zero otherwise,
$\sigma_i \in \{1, 2, \ldots, q\}$ denotes the spin state (or
group index) of node $i$ in the graph, and $a_{ij}$, $b_{ij}$,
$c_{ij}$, $d_{ij}$ denote the weights of the individual
contributions, respectively. The number of spin states $q$
determines the maximum number of groups allowed and can,
in principle, be as large as $N$, the number of nodes in the
network. Note that not all group indices necessarily have
to be used in the optimal assignment of nodes into
communities, as some spin states may remain unpopulated in
the ground state. If links and non-links are each weighted
equally, regardless of whether they are external or
internal, i.e. $a_{ij} = c_{ij}$ and $b_{ij} = d_{ij}$, then it
is enough to consider the internal links and non-links. It
remains to find a sensible choice of weights $a_{ij}$ and
$b_{ij}$, preferably such that the contribution of links and
non-links can be adjusted through a parameter. As we will
see, a convenient choice is $a_{ij} = 1 - \gamma p_{ij}$ and
$b_{ij} = \gamma p_{ij}$, where $p_{ij}$ denotes the
probability that a link exists between nodes $i$ and $j$,
normalized such that $\sum_{i \neq j} p_{ij} = 2M$. For
$\gamma = 1$ this leads to the natural situation that the
total amount of energy that can possibly be contributed by
links and non-links is equal:
$\sum_{i \neq j} A_{ij} a_{ij} = \sum_{i \neq j} (1 - A_{ij}) b_{ij}$.
For weighted networks this approach is generalized in a
straightforward manner by using a weighted adjacency matrix
$w_{ij}$. In case of a directed network with a non-symmetric
adjacency matrix $A_{ij} \neq A_{ji}$, one can construct a
symmetric representation of the network by replacing
$A_{ij}$ with $\tfrac{1}{2}(A_{ij} + A_{ji})$ and $p_{ij}$ with
$\tfrac{1}{2}(p_{ij} + p_{ji})$. In this article, we will only
deal with undirected, unweighted adjacency matrices. Our
choice of the weights allows us to further simplify the
Hamiltonian:

$$
\mathcal{H}(\{\sigma\}) = -\sum_{i < j} \left(A_{ij} - \gamma p_{ij}\right) \delta(\sigma_i, \sigma_j).
\tag{2}
$$



This represents a spin glass with couplings
$J_{ij} = A_{ij} - \gamma p_{ij}$ between all pairs of nodes:
ferromagnetic where links between nodes exist and
anti-ferromagnetic where links are absent.
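As a minimal sketch of how Eq. (2) can be evaluated, the following Python function sums the couplings over all pairs of nodes that share a spin state. The function name `hamiltonian`, the toy graph (two triangles joined by one edge), and the constant null model `p_const` are assumptions chosen for illustration; they are not taken from the original text.

```python
from itertools import combinations


def hamiltonian(adj, sigma, p, gamma=1.0):
    """Energy of Eq. (2): -sum_{i<j} (A_ij - gamma * p_ij) * delta(sigma_i, sigma_j)."""
    energy = 0.0
    for i, j in combinations(range(len(adj)), 2):
        if sigma[i] == sigma[j]:  # only pairs in the same spin state contribute
            energy -= adj[i][j] - gamma * p[i][j]
    return energy


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

# Assumed null model: all links equally probable, with sum_{i<j} p_ij = M.
p_const = 2 * len(edges) / (n * (n - 1))
p = [[p_const] * n for _ in range(n)]

h_split = hamiltonian(adj, [0, 0, 0, 1, 1, 1], p)  # one spin state per triangle
h_joint = hamiltonian(adj, [0] * n, p)             # all nodes in one spin state
print(h_split, h_joint)
```

In this toy case the assignment that separates the two triangles has the lower energy, in line with the principle "group together what is linked, keep apart what is not".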

Depending on the graph under study, one can assume
different expressions for $p_{ij}$. The Hamiltonian (2) is
effectively comparing the true distribution of links in the
graph under study with the expected distribution given
by a particular null model which defines $p_{ij}$. With this
in mind, we can rewrite (2) in the following two ways, which
agree up to an additive constant that does not depend on
the spin configuration:



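$$
\mathcal{H}(\{\sigma\}) = -\sum_{s} \underbrace{\left(m_{ss} - \gamma\, [m_{ss}]_{p_{ij}}\right)}_{c_{ss}}
\tag{3}
$$

and

$$
\mathcal{H}(\{\sigma\}) = \sum_{s < r} \underbrace{\left(m_{rs} - \gamma\, [m_{rs}]_{p_{ij}}\right)}_{a_{rs}}.
\tag{4}
$$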



Here, the sum runs over the $q$ spin states and $m_{rs}$
denotes the number of edges between spins in groups $r$ and
$s$. Consequently, the number of internal edges of group $s$
is denoted by $m_{ss}$. The symbol $[\cdot]_{p_{ij}}$ denotes an
expectation value under the assumption of a link
distribution $p_{ij}$, given the current assignment of spins.

In equations (3) and (4) we have also introduced the
coefficients of "cohesion" $c_{ss}$ and "adhesion" $a_{rs}$ to
our network terminology, which measure the difference
between realized and expected internal links or realized and
expected external links, respectively. Note that both
depend on the choice of the model of connectivity $p_{ij}$ and
the parameter $\gamma$. The choice of a particular form of
$p_{ij}$ allows for the adaptation of the quality function to
the specific problem under study and hence allows for the
comparison of the quality function for graphs with different
topology. The only restriction on $p_{ij}$ is that the number
of expected edges between and within groups is an extensive
quantity, i.e.
$[m_{1s}]_{p_{ij}} + [m_{2s}]_{p_{ij}} = [m_{1+2,s}]_{p_{ij}}$ for all
choices of disjoint groups $n_1$, $n_2$ and $n_s$, and
$[m_{ss}]_{p_{ij}} = [m_{11}]_{p_{ij}} + [m_{22}]_{p_{ij}} + [m_{12}]_{p_{ij}}$
for all groups $n_s$ with proper subgroups $n_1$ and $n_2$ of
empty intersection and union $n_s$. Using these equalities,
we can give a relation for the coefficient of cohesion of a
group of nodes $n_s$ and two proper subsets $n_{s1}$ and
$n_{s2}$ with empty intersection and union $n_s$. It is easy
to prove that

$$
c_{ss} = c_{11} + c_{22} + a_{12},
\tag{5}
$$

where $c_{11}$ and $c_{22}$ are the coefficients of cohesion of
the respective subsets $n_{s1}$ and $n_{s2}$, and $a_{12}$ is the
coefficient of adhesion between $n_{s1}$ and $n_{s2}$.
Equivalently, for the adhesion coefficients with $n_s$ of two
groups $n_{r1}$ and $n_{r2}$ with union $n_r$ and empty
intersection, we can write

$$
a_{rs} = a_{1s} + a_{2s}.
\tag{6}
$$
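The additivity relations (5) and (6) can be checked numerically. The sketch below computes cohesion and adhesion for an arbitrary symmetric matrix `p` of link probabilities; the helper names, the toy graph, and the constant choice of `p` are illustrative assumptions rather than part of the original derivation.

```python
def expected_within(group, p):
    """Expected number of links inside `group`: sum of p_ij over pairs i < j."""
    members = sorted(group)
    return sum(p[i][j] for a, i in enumerate(members) for j in members[a + 1:])


def expected_between(group_r, group_s, p):
    """Expected number of links between two disjoint groups."""
    return sum(p[i][j] for i in group_r for j in group_s)


def cohesion(edges, group, p, gamma=1.0):
    """c_ss = m_ss - gamma * [m_ss]_p for the nodes in `group`."""
    m_ss = sum(1 for i, j in edges if i in group and j in group)
    return m_ss - gamma * expected_within(group, p)


def adhesion(edges, group_r, group_s, p, gamma=1.0):
    """a_rs = m_rs - gamma * [m_rs]_p for two disjoint groups."""
    m_rs = sum(1 for i, j in edges
               if (i in group_r and j in group_s) or (i in group_s and j in group_r))
    return m_rs - gamma * expected_between(group_r, group_s, p)


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
p_const = 2 * len(edges) / (n * (n - 1))  # assumed constant link probability
p = [[p_const] * n for _ in range(n)]

whole, left, right = {0, 1, 2, 3, 4, 5}, {0, 1, 2}, {3, 4, 5}

# Eq. (5): cohesion of the union equals both cohesions plus the mutual adhesion.
lhs5 = cohesion(edges, whole, p)
rhs5 = cohesion(edges, left, p) + cohesion(edges, right, p) + adhesion(edges, left, right, p)

# Eq. (6): adhesion with `right` is additive over a split of `left`.
lhs6 = adhesion(edges, left, right, p)
rhs6 = adhesion(edges, {0, 1}, right, p) + adhesion(edges, {2}, right, p)
print(lhs5, rhs5, lhs6, rhs6)
```

Both relations hold for any `p` whose expectations are sums over node pairs, which is the extensivity requirement stated above.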



Two exemplary choices of link distribution models $p_{ij}$
shall illustrate the above. The simplest choice is to
assume every link equally probable with probability
$p_{ij} = p$, which leads naturally to

$$
[m_{ss}]_{p} = p\, \frac{n_s (n_s - 1)}{2}
\quad \text{and} \quad
[m_{rs}]_{p} = p\, n_r n_s,
\tag{7}
$$

with $n_r$ and $n_s$ denoting the number of spins in state $r$
and $s$, respectively. This choice of model leads to the
Hamiltonian originally quoted in earlier work:

$$
\mathcal{H}(\{\sigma\}) = -\sum_{(i,j) \in E} \delta(\sigma_i, \sigma_j)
 + \gamma p \sum_{s} \frac{n_s (n_s - 1)}{2}.
\tag{8}
$$



Here, the first sum runs over all edges and only internal
edges contribute. Equivalently, we can write (8) in terms
of external edges:

$$
\mathcal{H}(\{\sigma\}) = \sum_{(i,j) \in E} \left(1 - \delta(\sigma_i, \sigma_j)\right)
 - \gamma p \sum_{r < s} n_r n_s,
\tag{9}
$$

where only edges between different groups contribute to
the first sum. We see that both (8) and (9) compare the
actual number of internal or external edges with its
respective expectation value under the assumption of equally
probable links and given community sizes.
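A short sketch of Eqs. (8) and (9) shows that only the edge list and the occupation numbers $n_s$ are needed. The function names and the toy graph are assumptions for illustration. With $p = 2M / (N(N-1))$ the two forms coincide for $\gamma = 1$ and otherwise differ by the configuration-independent constant $(1 - \gamma) M$.

```python
from collections import Counter


def hamiltonian_internal(edges, sigma, p, gamma=1.0):
    """Eq. (8): -(number of internal edges) + gamma * p * sum_s n_s (n_s - 1) / 2."""
    internal = sum(1 for i, j in edges if sigma[i] == sigma[j])
    sizes = Counter(sigma).values()
    return -internal + gamma * p * sum(n * (n - 1) / 2 for n in sizes)


def hamiltonian_external(edges, sigma, p, gamma=1.0):
    """Eq. (9): (number of external edges) - gamma * p * sum_{r<s} n_r n_s."""
    external = sum(1 for i, j in edges if sigma[i] != sigma[j])
    sizes = list(Counter(sigma).values())
    total = sum(sizes)
    # sum_{r<s} n_r n_s = (N^2 - sum_s n_s^2) / 2
    cross_pairs = (total * total - sum(n * n for n in sizes)) / 2
    return external - gamma * p * cross_pairs


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
p = 2 * len(edges) / (n * (n - 1))  # average link density
sigma = [0, 0, 0, 1, 1, 1]

h8 = hamiltonian_internal(edges, sigma, p)
h9 = hamiltonian_external(edges, sigma, p)
print(h8, h9)
```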

A second choice for $p_{ij}$ may take into account that
the network exhibits a particular degree distribution.
Since links are in principle more probable between nodes
of high degree, links between these nodes get a lower
weight. We may write:

$$
p_{ij} = \frac{k_i k_j}{2M},
\tag{10}
$$

with $k_i$ the degree of node $i$, which takes this fact and
the degree distribution into account. Note that it is
possible to also include degree-degree correlations or any
other form of prior knowledge about $p_{ij}$ at this point.
With these expressions we have:

$$
[m_{ss}]_{p_{ij}} = \frac{1}{2M} \frac{K_s^2}{2}
\quad \text{and} \quad
[m_{rs}]_{p_{ij}} = \frac{1}{2M} K_r K_s.
\tag{11}
$$

Here, $K_s$ is the sum of degrees of nodes in spin state $s$
and plays the role of the occupation numbers in
equation (8). Using these expressions, we can also write the
Hamiltonian in a form similar to (8):

$$
\mathcal{H}(\{\sigma\}) = -\sum_{(i,j) \in E} \delta(\sigma_i, \sigma_j)
 + \frac{\gamma}{4M} \sum_{s} K_s^2.
\tag{12}
$$

Again, we give an equivalent formulation in terms of
external rather than internal edges similar to (9):



$$
\mathcal{H}(\{\sigma\}) = \sum_{(i,j) \in E} \left(1 - \delta(\sigma_i, \sigma_j)\right)
 - \frac{\gamma}{2M} \sum_{r < s} K_r K_s.
\tag{13}
$$

For $\gamma = 1$ and the model $p_{ij} = k_i k_j / 2M$, we can
derive

$$
2 c_{ss} + \sum_{r,\, r \neq s} a_{rs} = 0.
\tag{14}
$$

Furthermore, the cohesion is negative ($c_{ss} < 0$) if $n_s$
consists of only one single node. We see that there must
always exist a group of nodes $n_r$ to which this node
has positive adhesion. Groups of only one node hence do not
exist in the ground state. We stress that relation (14) and
the conclusions just drawn do not hold for $\gamma \neq 1$ or
$p_{ij} = p$.

We see that, even though we are dealing with an infinite
range spin glass with couplings between all pairs
of nodes, one only needs to consider the ferromagnetic
interactions along the links and the occupation numbers
or the sum of node degrees of the individual spin states.
This makes it easy to implement an efficient minimization
routine for this Hamiltonian. It should be noted
that both formulations, (8), (9) and (12), (13), are
equivalent in case of a network with fixed connectivity.


EQUIVALENCE WITH NEWMAN-GIRVAN MODULARITY

Comparing the performance in retrieving a known community
structure from computer generated test networks
may be used as a benchmark for different community
detection algorithms. Alternatively, many authors have
given values of the quality function Q defined by Newman
and Girvan as "modularity" as a global, comparative,
objective measure of how good a community structure
found by an algorithm is. Alternative formulations
focusing on the local aspects of community structure also
exist, such as that of "local modularity" introduced by
Muff et al. Newman and Girvan's modularity measure
can be written as:

$$
Q = \sum_{s} \left(e_{ss} - a_s^2\right), \quad \text{with} \quad a_s = \sum_{r} e_{rs}.
\tag{15}
$$

Here, $e_{rs}$ is the fraction of links that fall between
nodes in group $r$ and $s$, i.e. the probability that a
randomly drawn link connects a node in group $r$ to one in
group $s$. The probability that a link has one end in group
$s$ is expressed by $a_s$. From this, we expect a fraction
$a_s^2$ of links to connect nodes in group $s$ among
themselves. Newman's modularity measure hence compares the
actual link density in a community with an expectation
value. One can write this modularity in a slightly
different way following [19]:

$$
Q = \frac{1}{2M} \sum_{i,j} \left(A_{ij} - \frac{k_i k_j}{2M}\right) \delta(\sigma_i, \sigma_j).
\tag{16}
$$

This already resembles (2) when $p_{ij}$ takes the form
$k_i k_j / 2M$. It is now clear that we can write:

$$
Q = -\frac{1}{M}\, \mathcal{H}(\{\sigma\})
\tag{17}
$$

with the Hamiltonian (2) and $\gamma = 1$. Therefore,
maximum modularity is reached when the Hamiltonian (2)
with $p_{ij} = k_i k_j / 2M$, or equivalently (12) or (13)
with $\gamma = 1$, is minimal. To maximize the modularity of
a community structure is hence equivalent to finding the
spin configuration that minimizes these Hamiltonians.
This form of writing the modularity Q is much simpler
than the one given by Guimera et al. [20], which also
involves 3- and 4-spin interactions. We will see below that,
using this form, we can give efficient update rules that
allow the direct optimization of the modularity even on
very large networks.


PROPERTIES OF THE HAMILTONIAN AND ITS
GROUND STATE



Having mapped the problem of community finding onto
finding the ground state configuration of a spin glass,
we can investigate the properties of this minimum
energy spin configuration. These properties will provide us
with a definition of what a community is in the
framework of maximizing a quality function. These properties
will apply to any local minimum of the Hamiltonian as
well, such that we can interpret these local minima as
alternative community structures. Inspection of the
total energy landscape and comparison of global and
local minima and the respective community structures will
then provide insight into the clustering tendency of the
network. Obviously, the more local minima with little
overlap but energies comparable to the global minimum
there are, the more spin glass like the energy landscape
is and the less the network shows a truly modular structure.
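Comparing alternative spin configurations only requires evaluating their energies. The following sketch evaluates the degree-based Hamiltonian of Eq. (12) and the modularity of Eq. (15) for two assignments of a toy graph and confirms the relation $Q = -\mathcal{H}/M$ of Eq. (17) at $\gamma = 1$. It assumes $e_{ss} = m_{ss}/M$ and $a_s = K_s / 2M$ as the concrete reading of the quantities in Eq. (15); the function names and the toy graph are illustrative only.

```python
from collections import defaultdict


def hamiltonian_degree_model(edges, sigma, gamma=1.0):
    """Eq. (12): -(number of internal edges) + gamma / (4M) * sum_s K_s^2."""
    m = len(edges)
    degree_sum = defaultdict(int)  # K_s: sum of degrees in spin state s
    internal = 0
    for i, j in edges:
        degree_sum[sigma[i]] += 1
        degree_sum[sigma[j]] += 1
        if sigma[i] == sigma[j]:
            internal += 1
    return -internal + gamma / (4 * m) * sum(k * k for k in degree_sum.values())


def modularity(edges, sigma):
    """Eq. (15) with the assumed reading e_ss = m_ss / M and a_s = K_s / (2M)."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for i, j in edges:
        degree_sum[sigma[i]] += 1
        degree_sum[sigma[j]] += 1
        if sigma[i] == sigma[j]:
            internal[sigma[i]] += 1
    return sum(internal[s] / m - (degree_sum[s] / (2 * m)) ** 2 for s in list(degree_sum))


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
m_edges = len(edges)

by_triangle = [0, 0, 0, 1, 1, 1]  # one spin state per triangle
by_pairs = [0, 0, 1, 1, 2, 2]     # an alternative, less cohesive assignment

for sigma in (by_triangle, by_pairs):
    print(hamiltonian_degree_model(edges, sigma), modularity(edges, sigma))
```

The assignment by triangles has the lower energy and therefore the higher modularity of the two.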

Since the Hamiltonians are all additive with respect to
the different communities, i.e. the numbers of edges and
the corresponding expectation values are extensive, the
communities can be seen as independent entities and we can
treat a single community independently from the rest of the
network. The configuration space over which the
Hamiltonian is minimized is a discrete space. Once we have
defined a move set that is ergodic in this discrete space,
a (local) minimum of the Hamiltonian (with respect to
this move set) is defined as a configuration for which none
of the steps from the move set leads to a lower energy. It
is sufficient to consider only one move: change a group
of nodes $n_1$ from spin state $s$ to spin state $r$. The
change in energy for this move in configuration space is:

$$
\Delta\mathcal{H} = a_{1,s \setminus 1} - a_{1r}.
\tag{18}
$$



Here $a_{1,s \setminus 1}$ is the adhesion of $n_1$ with its
complement in $n_s$, and $a_{1r}$ is the adhesion of $n_1$ with
$n_r$. If we move $n_1$ to a previously unpopulated spin
state, then $\Delta\mathcal{H} = a_{1,s \setminus 1}$. This move
corresponds to dividing group $n_s$. Furthermore, if
$n_1 = n_s$, we have $\Delta\mathcal{H} = -a_{sr}$, which
corresponds to joining groups $n_s$ and $n_r$. For a spin
configuration to be a local minimum of the Hamiltonian,
there must not exist a move of this type that leads to a
lower energy. It is clear that some moves may not change
the energy and are hence called neutral moves. In case of
equality $a_{1,s \setminus 1} = a_{1r}$ and $n_r$ being a
community itself, we say that communities $n_s$ and $n_r$
have an overlap of the nodes in $n_1$.
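For a single node, the energy change of Eq. (18) can be computed from the node's neighbors and the degree sums $K_s$ alone. The sketch below assumes the degree-based model $p_{ij} = k_i k_j / 2M$ of Eq. (10) and cross-checks the local result against a full evaluation of Eq. (12). All function names and the toy graph are hypothetical, and the target state is assumed to differ from the node's current state.

```python
from collections import defaultdict


def build_state(edges, sigma):
    """Neighbor lists, node degrees, and the degree sums K_s of each spin state."""
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    degree = {v: len(nb) for v, nb in neighbors.items()}
    degree_sum = defaultdict(int)
    for v, k in degree.items():
        degree_sum[sigma[v]] += k
    return neighbors, degree, degree_sum


def delta_h_move(node, target, sigma, neighbors, degree, degree_sum, two_m, gamma=1.0):
    """Eq. (18) for one node: adhesion with the rest of its group minus adhesion with `target`."""
    source = sigma[node]
    k = degree[node]
    links_source = sum(1 for v in neighbors[node] if sigma[v] == source)
    links_target = sum(1 for v in neighbors[node] if sigma[v] == target)
    a_source = links_source - gamma * k * (degree_sum[source] - k) / two_m
    a_target = links_target - gamma * k * degree_sum.get(target, 0) / two_m
    return a_source - a_target


def energy_eq12(edges, sigma, gamma=1.0):
    """Full energy of Eq. (12), used here only to cross-check the local update."""
    m = len(edges)
    internal = sum(1 for i, j in edges if sigma[i] == sigma[j])
    degree_sum = defaultdict(int)
    for i, j in edges:
        degree_sum[sigma[i]] += 1
        degree_sum[sigma[j]] += 1
    return -internal + gamma / (4 * m) * sum(k * k for k in degree_sum.values())


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
sigma = [0, 0, 0, 1, 1, 1]
neighbors, degree, degree_sum = build_state(edges, sigma)
two_m = 2 * len(edges)

# Move node 2 into the other triangle's spin state.
local = delta_h_move(2, 1, sigma, neighbors, degree, degree_sum, two_m)
full = energy_eq12(edges, [0, 0, 1, 1, 1, 1]) - energy_eq12(edges, sigma)

# Move node 2 into the unpopulated spin state 2 (dividing its group).
split_off = delta_h_move(2, 2, sigma, neighbors, degree, degree_sum, two_m)
print(local, full, split_off)
```

Both moves raise the energy in this example, so the assignment by triangles is a local minimum with respect to moves of node 2.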

For a community defined as a group of nodes with the
same spin state in a spin configuration that makes the
Hamiltonian minimal, we then have the following
properties:

1. Every proper subset $n_1$ of a community $n_s$ has
a maximum coefficient of adhesion with its complement
in the community compared to the coefficient of adhesion
with any other community
($a_{1,s \setminus 1} = \max$).

2. The coefficient of cohesion is non-negative for all
communities ($c_{ss} \geq 0$).

3. The coefficient of adhesion between any two
communities is non-positive ($a_{rs} \leq 0$).

The first property is proven by contradiction from the
fact that we are dealing with a spin configuration that
makes the Hamiltonian minimal. We also see immediately
that every proper subset $n_1$ of a community $n_s$
must have a non-negative adhesion with its complement
$n_{s \setminus 1}$ in the community. In particular, this is
true for every single node $l$ in $n_s$
($a_{l,s \setminus l} \geq 0$). Then we can write
$\sum_{l \in n_s} a_{l,s \setminus l} \geq 0$. Since
$\sum_{l \in n_s} m_{l,s \setminus l} = 2 m_{ss}$ and
$\sum_{l \in n_s} [m_{l,s \setminus l}]_{p_{ij}} = 2 [m_{ss}]_{p_{ij}}$,
this implies $c_{ss} \geq 0$ for all communities $s$ and proves
the second property. The third property is proven by
contradiction again. Again, we stress that for $\gamma = 1$
and $p_{ij} = k_i k_j / 2M$, no community is formed of a
single node due to condition (14). The last two properties
can be summarized in the following inequality, which
provides an intuition about the significance of the
parameter $\gamma$:

$$
c_{ss} \geq 0 \geq a_{rs} \quad \forall\, r, s.
\tag{19}
$$



Assuming a constant link probability, we can rewrite this
inequality in order to relate the inner link density of a
community and the outer link density between communities
with an average link density:

$$
\frac{2 m_{ss}}{n_s (n_s - 1)} \geq \gamma p \geq \frac{m_{rs}}{n_r n_s} \quad \forall\, r, s.
\tag{20}
$$



We see that $\gamma p$ can be interpreted as a threshold
between inner and outer link density under the assumption
of a constant link probability. The above definition of
what a community is adapts itself to any network, since the
specific network model is encoded in the definition of
cohesion and adhesion. This makes it possible to compare
community structures of networks with different topology.
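The threshold interpretation can be made concrete with a small check of inequality (19) under a constant link probability, which is equivalent to (20) for groups of more than one node. The function name and the toy graph are assumptions for illustration; the check is written in terms of cohesion and adhesion to avoid dividing by zero for single-node groups.

```python
from collections import Counter
from itertools import combinations


def satisfies_density_condition(edges, sigma, p, gamma=1.0):
    """Check c_ss >= 0 for every group and a_rs <= 0 for every pair of groups, with p_ij = p."""
    size = Counter(sigma)
    m = Counter()  # m[(r, s)] with r <= s: number of edges between groups r and s
    for i, j in edges:
        r, s = sorted((sigma[i], sigma[j]))
        m[(r, s)] += 1
    cohesive = all(m[(s, s)] - gamma * p * n * (n - 1) / 2 >= 0 for s, n in size.items())
    separated = all(m[(r, s)] - gamma * p * size[r] * size[s] <= 0
                    for r, s in combinations(sorted(size), 2))
    return cohesive and separated


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
p = 2 * len(edges) / (n * (n - 1))  # average link density

ok_triangles = satisfies_density_condition(edges, [0, 0, 0, 1, 1, 1], p)
ok_pairs = satisfies_density_condition(edges, [0, 0, 1, 1, 2, 2], p)
ok_high_gamma = satisfies_density_condition(edges, [0, 0, 0, 1, 1, 1], p, gamma=3.0)
print(ok_triangles, ok_pairs, ok_high_gamma)
```

Raising $\gamma$ raises the threshold $\gamma p$: at `gamma=3.0` even the fully connected triangles no longer exceed it, so the same assignment fails the condition.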



SIMPLE DIVISIVE AND AGGLOMERATIVE 
APPROACHES TO MODULARITY 
MAXIMIZATION 

Hierarchical clustering techniques can be dichotomized
into divisive and agglomerative approaches. We will
show how a simple recursive divisive approach and an
agglomerative approach may be implemented and where
they fail.

In the present framework, a hierarchical divisive
algorithm would mean to construct the ground state of the
q-state Potts model by recursively partitioning the network
into two parts according to the ground state of a 2-state
Potts or Ising system. This procedure would be
computationally simpler and would result directly in a
hierarchy of clusters, due to repetition of the procedure
on the parts until the total energy cannot be lowered
anymore. Such a procedure would be justified if the ground
state of the q-state Potts Hamiltonian and the repeated
application of the Ising system cut the network along the
same edges. We will derive a condition under which this can
be ensured.
FIG. 1: Illustration of the problem of recursive
bi-partitioning. The ground state of the Hamiltonian with only
2 possible spin states, as shown in a.), would cut through one
of the communities that are found when allowing 3 spin states,
as shown in b.).



In order for this recursive approach to work, we must
ensure that the ground state of the 2-state Hamiltonian
never cuts through a community as defined by the q-state
Hamiltonian. Assume a network made of three communities
$n_1$, $n_2$ and $n_3$ as defined by the ground state of
the q-state Hamiltonian. For the bi-partitioning, we now
have two possible scenarios. Without loss of generality,
the cut is made either between $n_2$ and $n_1 + n_3$, or
between $n_1$, $n_2$ and $n_3 = n_a + n_b$, parting the
network into $n_1 + n_a$ and $n_2 + n_b$. Since the former
situation should be energetically lower for the algorithm
to work, we arrive at the condition that

$$
m_{ab} - [m_{ab}]_{p_{ij}} + m_{1b} - [m_{1b}]_{p_{ij}} > m_{2b} - [m_{2b}]_{p_{ij}},
\tag{21}
$$

which must be valid for all subgroups $n_a$ and $n_b$ of
community $n_3$. Since $n_3$ is a community, we further know
that $m_{ab} - [m_{ab}]_{p_{ij}} > m_{1b} - [m_{1b}]_{p_{ij}}$ and
$m_{ab} - [m_{ab}]_{p_{ij}} > m_{2b} - [m_{2b}]_{p_{ij}}$. Though
$m_{ab} - [m_{ab}]_{p_{ij}} > 0$, since $n_3$ is a community,
$m_{1b} - [m_{1b}]_{p_{ij}} < 0$ and
$m_{2b} - [m_{2b}]_{p_{ij}} < 0$ for the same reason, and hence
condition (21) is not generally satisfied. Figure 1
illustrates a counterexample, assuming $p_{ij} = p$, the
link probability in the network. The upper part a.) of the
figure shows the ground state of the system when using only
two spin states. Part b.) of the figure shows the ground
state of the system without constraints on the number of
spin states, resulting in a configuration of 3 communities.
We see that the bi-partitioning approach would have cut
through one of the communities in the network. Recursive
bi-partitionings cannot generally lead to an optimal
assignment of spins that maximizes the modularity.

FIG. 2: Example network for which an agglomerative
approach of grouping together nodes of maximal adhesion will
fail. Starting from an assignment of different spin states to
every node, the largest adhesion is found for the nodes
connected by edge x, and these nodes are grouped together
first by the agglomerative procedure. However, it is
clearly seen that x should lie between different groups.

Newman has introduced a fast greedy strategy
for modularity maximization. It effectively corresponds
to a simple nearest neighbor agglomerative clustering of
the network, where the adhesion coefficient $a_{rs}$ is used
as similarity measure between groups. Newman's algorithm
initially assigns different spin states to every node and
then proceeds by grouping those nodes together that have the
highest coefficient of adhesion. As Figure 2 shows, this
approach fails if the links between two communities
connect nodes of low degree. The network consists of 14
nodes and 37 links. It is clearly seen that in the ground
state, the network consists of two communities and edge
x lies between them. However, when initially assigning
different spin states to all nodes, the adhesion between
the nodes connected by x is largest: $a = 1 - 16/2M$,
since the product of degrees at this edge is lowest.
Therefore, the agglomerative procedure described is misled
into grouping together the nodes connected by x already in
the very first step. Furthermore, it is clear that in a
network where all nodes have the same degree initially, all
edges connect nodes with the same coefficient of adhesion.
In this case, it cannot be decided at all which nodes to
group together in the first step of the algorithm. It was
shown by Newman that the approach does deliver good
results in benchmarks using computer generated test
networks. The success of this approach depends of course on
whether or not the misleading situations have a strong
effect on the final outcome of the clustering. In the
example shown, after grouping together the nodes at the
end points of x, the algorithm will then proceed by adding
further nodes from only one of the two communities
linked by x. Hence, the initial mistake persists, but does
not completely destroy the result of the clustering.
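The first step of such an agglomerative procedure can be sketched by computing, for every edge, the adhesion $a_{ij} = 1 - \gamma k_i k_j / 2M$ between its end points when each node occupies its own spin state. The graph below is a hypothetical construction (two 5-cliques, each with one attached low-degree node, and a bridge between those two nodes), not the network of Figure 2; the function name is likewise an assumption.

```python
from collections import Counter
from itertools import combinations


def initial_edge_adhesions(edges, gamma=1.0):
    """Adhesion 1 - gamma * k_i * k_j / (2M) between the end points of each edge,
    with every node starting in its own spin state."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    two_m = 2 * len(edges)
    return {(i, j): 1 - gamma * degree[i] * degree[j] / two_m for i, j in edges}


# Hypothetical example: cliques on nodes 0-4 and 5-9, node 10 attached to the
# first clique, node 11 attached to the second, and the bridge x = (10, 11).
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 10), 2))
edges += [(10, 0), (10, 1), (11, 5), (11, 6), (10, 11)]

adhesions = initial_edge_adhesions(edges)
first_merge = max(adhesions, key=adhesions.get)
print(first_merge, adhesions[first_merge])
```

Because the bridge connects the two nodes with the lowest degree product, a greedy rule that merges the pair of maximal adhesion selects it first, which is the failure mode described above.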



COMPARISON WITH OTHER DEFINITIONS OF 
COMMUNITIES 

We have defined the term community as a set of nodes
having properties 1.) through 3.). Compared with the
many definitions of community in the sociological
literature, this definition is most similar to that of an
"LS-set". An LS-set is a set of nodes $S$ in a network
such that each of its proper subsets has more links to its
complement in $S$ than to the rest of the network [23].
Previously, Radicchi et al. had given a definition of
community "in a strong sense" as a set of nodes $V$ with
the condition $k_i^{in} > k_i^{out}\ \forall\, i \in V$, i.e.
every node in the group has more links to other members of
the group than to the rest of the network. In the same
manner, they define a community in a "weak sense" as a set
of nodes $V$ for which
$\sum_{i \in V} k_i^{in} > \sum_{i \in V} k_i^{out}$, i.e. the
total number of internal links is larger than half of the
number of the external links, since the sum of $k_i^{in}$ is
twice the number of internal edges. The similarity with
properties 1.) and 2.) of our definition is evident, but
instead of comparing absolute numbers, our definition
compares absolute numbers to expectation values for these
quantities in the form of the coefficients of cohesion and
adhesion. One of the consequences of Radicchi et al.'s
definitions is that every union of two communities is also
a community. This leads to the strange situation that a
community in the "strong" or "weak" sense can also be an
ensemble of disjoint groups of nodes. This paradox may only
be resolved if one assumes a priori that there exists a
hierarchy of communities. The following considerations and
examples will show that hierarchies in community structures
are possible, but cannot be taken for granted. The
representation of community structures by dendrograms,
therefore, cannot always capture the true community
structure. Another definition of communities that implies a
hierarchy is that given by Palla et al. There, a community
is interpreted as a set of nodes that can be reached
through a clique percolation process. This definition is
very strict and focuses more on local structural properties
of the graph, whereas the other definitions, including
ours, have a link density based interpretation, which also
makes them more robust in the case of "noisy" data
sets.
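The "strong" and "weak" conditions quoted above are easy to state in code, which also makes the paradox of disjoint unions visible. The sketch below uses a hypothetical chain of three triangles; the function names are assumptions for illustration.

```python
def in_out_degrees(edges, group):
    """For every node in `group`, count links to other members (k_in) and to the rest (k_out)."""
    k_in = {v: 0 for v in group}
    k_out = {v: 0 for v in group}
    for i, j in edges:
        for a, b in ((i, j), (j, i)):
            if a in group:
                if b in group:
                    k_in[a] += 1
                else:
                    k_out[a] += 1
    return k_in, k_out


def is_strong_community(edges, group):
    """Strong sense: k_in > k_out for every node of the group."""
    k_in, k_out = in_out_degrees(edges, group)
    return all(k_in[v] > k_out[v] for v in group)


def is_weak_community(edges, group):
    """Weak sense: the sum of k_in exceeds the sum of k_out over the group."""
    k_in, k_out = in_out_degrees(edges, group)
    return sum(k_in.values()) > sum(k_out.values())


# Hypothetical chain of three triangles A - B - C, linked by edges (2,3) and (5,6).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (6, 7), (6, 8), (7, 8),
         (2, 3), (5, 6)]
group_a, group_c = {0, 1, 2}, {6, 7, 8}

# A and C share no edge, yet their union still passes both tests.
union_ac = group_a | group_c
print(is_strong_community(edges, group_a),
      is_strong_community(edges, union_ac),
      is_weak_community(edges, union_ac))
```

The union of the two outer triangles satisfies the strong condition although no link connects them, illustrating why a definition based on absolute link counts can label an ensemble of disjoint groups as a single community.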



OVERLAP AND STABILITY OF COMMUNITY 
ASSIGNMENTS 

One cannot generally assume that a community structure 
of a network is uniquely defined. There may exist several 
very different partitions that all have a comparably 
high value of modularity. Palla et al. [21] have introduced 
an algorithm to detect overlapping communities by clique 
percolation and Gfeller et al. have introduced the notion 
of nodes lying "between clusters" [23]. In the framework 
of this article, the overlap of communities is linked to 
the degeneracy of the minima of the Hamiltonian. This 
degeneracy can arise in several ways and we have to dif- 
ferentiate between two different types of overlap: overlap 
of community structure and overlap of communities. 

We have already seen that it is undecidable whether 
a group of nodes $n_t$ should be a member of community $n_s$ 
or $n_r$ if the coefficients of adhesion are equal for both 
of these communities. Formally, we find $a_{t,s \setminus t} = a_{tr}$. In 
this situation, we speak of overlapping communities $n_s$ 
and $n_r$ with overlap $n_t$, since the number of communities 
in the network is not affected by this type of degeneracy. 



Nodes that do not form part of overlaps will always be 
grouped together and can be seen as the non-overlapping 
cores of communities. An example of this can be found 
in Figure 3 a.), where communities A and B overlap in 
node x. The ground state at $\gamma = 1$ is twofold degenerate 
with node x belonging either to A or B. 

On the other hand, it may be undecidable whether two 
groups of nodes should be grouped together or apart if 
the coefficient of adhesion between them is zero, i.e. there 
exist as many edges between them as expected from the 
model $p_{ij}$. Similarly, it may be undecidable whether a group of 
nodes should form its own community or be divided and 
the parts joined with different communities, if this can be 
done without increasing the energy. In these situations, 
the number of communities in the ground state is not well 
defined and we cannot speak of overlapping communities, 
since communities do not share nodes in the degenerate 
realisations. We will hence refer to such a situation as 
overlapping community structures. An example of this 
can be found in Figure 3 d.), where the three nodes in 
groups A and B form either one community as in a.) or 
two distinct communities of 2 and 1 node each. In gen- 
eral, however, both types of overlap may be present in a 
network. 

Since the coefficients of adhesion and cohesion depend 
on the value of $\gamma$ chosen, one can assess the stability of 
community structures under the change of this parame- 
ter. The network shown in Figure 3 illustrates the change 
of the ground state configuration with $\gamma$. 
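
The dependence of the ground state on $\gamma$ can be retraced from the six energies printed in the panels of Figure 3. The sketch below is an illustration under assumptions: it takes the panel energies in the order a.) to f.), uses exact fractions, and does not include the ferromagnetic configuration, whose energy is not printed in the figure. The names `ENERGIES` and `ground_states` are chosen here for illustration.

```python
from fractions import Fraction as F

# Energies H(gamma) = c0 + c1 * gamma of configurations a.) to f.),
# as printed in the panels of Figure 3 (assumed to be in this order).
ENERGIES = {
    "a": (-7, F(65, 16)),
    "b": (-6, F(51, 16)),
    "c": (-5, F(45, 16)),
    "d": (-4, F(39, 16)),
    "e": (-3, F(33, 16)),
    "f": (-2, F(27, 16)),
}


def ground_states(gamma):
    """Labels of all listed configurations that attain the minimal energy."""
    energy = {name: c0 + c1 * gamma for name, (c0, c1) in ENERGIES.items()}
    e_min = min(energy.values())
    return sorted(name for name, e in energy.items() if e == e_min)


for gamma in (F(1), F(8, 7), F(2), F(8, 3), F(3)):
    print(gamma, ground_states(gamma))
```

Among the listed configurations, the minimum changes exactly at the values $\gamma = 8/7$ and $\gamma = 8/3$ named in the caption of Figure 3.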

We have already stressed that properties 1 through 3 
are also valid for any local minimum of the energy land- 
scape defined by the Hamiltonian and the graph. They 
only imply that one cannot jump over energy barriers 
and move into deeper minima using the suggested move 
set. It may therefore be interesting to study also the lo- 
cal minima and compare them to the ground state. Local 
minima may be sampled by running greedy optimization 
algorithms using random initial conditions. This allows 
for a probabilistic interpretation of the community struc- 
ture induced by the minima of the Hamiltonian. For cor- 
related energy landscapes, it is known that deeper local 
minima have larger basins of attraction in the configu- 
ration space. The Hamiltonian induces such a cor- 
related energy landscape on the graph, since the total 
energy is not drastically affected by single spin changes. 
We therefore expect that the deep local minima will be 
sampled with higher frequency and that pairs of nodes 
that are grouped together in deep minima will have larger 
entries in a co-appearance matrix $C_{ij}$ that keeps track of 
how frequently nodes i and j have been grouped together 
in a local minimum over multiple runs of a minimization 
routine. A number of examples of co-appearance matri- 
ces sampling local energy minima at different values of $\gamma$ 
have been given in earlier work. 
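
The sampling of local minima described above can be sketched as follows. This is a minimal illustration, assuming the null model $p_{ij} = k_i k_j/2M$, a greedy single-spin descent from random initial conditions, and a toy graph of two 4-cliques joined by one edge; the function names `greedy_minimum` and `co_appearance` are hypothetical and not taken from the original work.

```python
import random
from itertools import combinations


def greedy_minimum(adj, gamma, q, rng):
    """One greedy descent from a random start; returns node -> spin state."""
    nodes = list(adj)
    deg = {i: len(adj[i]) for i in nodes}
    two_m = sum(deg.values())
    spin = {i: rng.randrange(q) for i in nodes}
    K = [0] * q                         # sum of degrees in each spin state
    for i in nodes:
        K[spin[i]] += deg[i]
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for i in nodes:
            K[spin[i]] -= deg[i]        # adhesion is measured without node i
            links = [0] * q
            for j in adj[i]:
                links[spin[j]] += 1
            adhesion = [links[s] - gamma * deg[i] * K[s] / two_m
                        for s in range(q)]
            best = max(range(q), key=adhesion.__getitem__)
            if adhesion[best] > adhesion[spin[i]] + 1e-12:
                spin[i] = best          # this move strictly lowers the energy
                improved = True
            K[spin[i]] += deg[i]
    return spin


def co_appearance(adj, gamma=1.0, q=4, runs=50, seed=0):
    """Fraction of runs in which each pair of nodes shares a spin state."""
    rng = random.Random(seed)
    counts = {pair: 0 for pair in combinations(sorted(adj), 2)}
    for _ in range(runs):
        spin = greedy_minimum(adj, gamma, q, rng)
        for i, j in counts:
            counts[(i, j)] += spin[i] == spin[j]
    return {pair: c / runs for pair, c in counts.items()}


# Toy graph (an assumption for illustration): two 4-cliques joined by edge 3-4.
adj = {n: set() for n in range(8)}
for block in (range(0, 4), range(4, 8)):
    for u, v in combinations(block, 2):
        adj[u].add(v)
        adj[v].add(u)
adj[3].add(4)
adj[4].add(3)

C = co_appearance(adj)
print(C[(0, 1)], C[(0, 7)])   # pair inside one clique vs. pair across cliques
```

Pairs of nodes inside one clique should appear together far more often than pairs from different cliques, which is the signal the co-appearance matrix is meant to expose.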

FIG. 3: For different values of $\gamma$, different spin configurations 
minimize the energy to form ground states. For $\gamma < 16/63$, 
the ground state is ferromagnetic. For $16/63 < \gamma < 8/7$, 
the two-fold degenerate configuration a.) is the ground state, 
with node x belonging either to community A or B. For 
$8/7 < \gamma < 8/3$, configuration b.) shows the non-degenerate 
ground state. For $\gamma = 8/3$, configurations b.), c.), d.), e.) 
and f.) all form ground states, but only f.) is the ground state 
for $8/3 < \gamma < 4$. Energies printed in the panels, in order: 
$H = -7 + \frac{65}{16}\gamma$, $H = -6 + \frac{51}{16}\gamma$, $H = -5 + \frac{45}{16}\gamma$, 
$H = -4 + \frac{39}{16}\gamma$, $H = -3 + \frac{33}{16}\gamma$, $H = -2 + \frac{27}{16}\gamma$. 

Here, we shall instead investigate the possible hierar- 
chies of the community structures directly from the ad- 
jacency matrix. The ordering of the rows and columns 
corresponding to nodes of the network is such that be- 
tween any two nodes that are assigned the same spin 
state, there never lies a node of different spin. The in- 
ternal order among the nodes of the same spin state is 
random. The choice of the ordering of the communities 
is arbitrary, but some orderings may be more intuitive 
than others. The link density in the adjacency matrix 
is directly transformed into grey levels. Since the inner 
link density of a community is higher than the exter- 
nal, we can distinguish communities as square blocks of 
darker grey. Different orderings may be combined into a 
consensus ordering. That is, starting from a given super- 
ordering, we reorder the nodes within each community 
according to a second given sub-ordering, i.e. we only 
change the internal order of the nodes within communi- 
ties of the super-ordering. 
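
A consensus ordering of this kind amounts to a lexicographic sort. The following sketch assumes each ordering is given as a mapping from node to community index; the function name `consensus_ordering` and the example labels are illustrative only.

```python
def consensus_ordering(nodes, *labelings):
    """Order nodes by the first labeling, refined within groups by the next ones.

    Each labeling maps node -> community index; the first labeling is the
    super-ordering, every further one only reorders nodes inside its groups.
    """
    return sorted(nodes, key=lambda n: tuple(lab[n] for lab in labelings))


super_lab = {"u": 0, "v": 1, "w": 0, "x": 1, "y": 0}   # coarse communities
sub_lab = {"u": 2, "v": 0, "w": 1, "x": 0, "y": 2}     # finer communities
print(consensus_ordering(list(super_lab), super_lab, sub_lab))
```

Because Python's sort is stable, nodes that share all labels keep their input order, so the internal order within the finest groups remains arbitrary, as in the text.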

First, we give an example of a completely hierarchical 
network. By hierarchical, we mean that all communities 
found at a value of $\gamma_2 > \gamma_1$ are proper sub-communities 
of the communities found at $\gamma_1$. In our example, we 
have constructed a network made of four large commu- 
nities of 128 nodes each. Each of these nodes has an 
average of 7.5 links to the 127 other members of its 
community and 5 links to the remaining 384 nodes in the 
network. Each of these four communities is composed 
of four sub-communities of 32 nodes each. Each node 
has an additional 10 links to the 31 other nodes in its 
sub-community. Figure 4 shows the adjacency matrix 
of this network in different orderings. At $\gamma = 1$, the 
ground state is composed of the four large communities 
as shown in the left part of Figure 4. Increasing $\gamma$ above 
a certain threshold makes assigning different spin states 
to the 16 sub-communities the ground state configura- 
tion. The middle part of Figure 4 shows an ordering 
obtained with a value of $\gamma = 2.2$. We can see that some 
of these sub-communities are more densely connected 
among each other. Imposing the latter ordering on top of 
the ordering obtained at $\gamma = 1$ then allows us to display the 
full community structure and hierarchy of the network as 
shown in the right part of Figure 4. Note that we have 
not used a recursive approach applying the community 
detection algorithm to separate subgroups. Instead, we 
have obtained two independent orderings which are only 
compatible with each other because the network has a 
hierarchical structure of dense communities composed of 
denser sub-communities. 
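
A network with these average link numbers can be generated as a random graph with three link probabilities. The sketch below is one possible construction, not necessarily the one used by the authors: it assumes independent links, contiguous node numbering of the groups, and that the "additional" sub-community links can be modelled by adding the two probabilities. The function name `hierarchical_network` is hypothetical.

```python
import random


def hierarchical_network(seed=0):
    """Edge list of a 512-node graph with 4 communities of 4 sub-communities."""
    rng = random.Random(seed)
    n, big, small = 512, 128, 32
    p_out = 5 / 384       # 5 links to the 384 nodes outside the community
    p_com = 7.5 / 127     # 7.5 links to the 127 other community members
    p_sub = 10 / 31       # 10 additional links inside the sub-community
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if i // big != j // big:
                p = p_out
            elif i // small != j // small:
                p = p_com
            else:
                p = p_com + p_sub
            if rng.random() < p:
                edges.append((i, j))
    return edges


edges = hierarchical_network()
mean_degree = 2 * len(edges) / 512
print(round(mean_degree, 2))   # expected to be close to 5 + 7.5 + 10 = 22.5
```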

In contrast to this situation, Figure 5 shows an exam- 
ple of a network that is only partially hierarchical. The 
network consists of 2 large communities A and B contain- 
ing 512 nodes, which have on average 12 internal links per 
node. Within A and B, a sub-group of 128 nodes exists, 
which we denote by a and b, respectively. Every node 
within this sub-group has 6 of its 12 intra-community 
links with the 127 other members of this sub-group. The 
two sub-groups a and b have on average 3 links per node 
with each other. Additionally, every node has two links 
with randomly chosen nodes from the network. From 
Figure 5 we see that we find the two large communi- 
ties using $\gamma = 0.5$. Maximum modularity, however, is 
reached at $\gamma = 1$ when a and b are joined into a sepa- 
rate community. Only when using the consensus of the 
orderings obtained at $\gamma = 0.5$ and $\gamma = 1$ can we under- 
stand the full community structure with a and b being 
subgroups that are responsible for the majority of links 
between A and B. It is understood that this situation 
cannot be interpreted as a hierarchy, even though a and 
b are cohesive subgroups in A and B, respectively. We 
shall now turn to a real world example to see whether 
these structural properties can indeed be found outside 
of artificially constructed examples. 

FIG. 4: Example of an adjacency matrix for a perfectly hierarchical network. The network consists of four communities, 
each of which is composed of four sub-communities. Using $\gamma = 1$, we find the four main communities (left). With $\gamma = 2.2$, 
we find the 16 sub-communities (middle). Link density variations in the off-diagonal parts of the adjacency matrix already 
hint at a hierarchy. The consensus ordering (right) shows that each of the larger communities is indeed composed of four 
sub-communities. 

FIG. 5: Example of an adjacency matrix for an only partially hierarchical network with overlapping community structure. 
The network consists of two large communities A and B, each of which contains a sub-community, a and b respectively, which are densely 
linked with each other. Using $\gamma = 0.5$, we find the two large communities (left). With a larger $\gamma = 1$, we find the two small 
sub-communities a and b grouped together. The consensus ordering (right) shows that most of the links that join A and B in 
fact lie between a and b. 

As a real world example, we study the co-authorship 
network of the Los Alamos condensed matter 
preprint archive, considering articles published between 
April 1998 and February 2004. This network has also 
been analyzed by Palla et al. in [21]. Every article induces 
a complete subgraph between the authors in this network. 
Since articles with a large number of authors induce very 
large cliques, every link induced by a single paper of n au- 
thors is only given a weight of $1/(n - 1)$. After summing 
the weights for all papers, only links with a weight of 0.1 
and greater were kept, transforming the network into a 
non-weighted one. The network consists of 30,561 nodes 
connected by 125,959 links. There are 668 connected 
components, the largest of which has 28,502 nodes and 
123,604 links. We only work with the largest connected 
component. The average degree is $\langle k \rangle = 8.7$. We then 
minimize the Hamiltonian using $p_{ij} = k_i k_j/2M$ and 
$q = 500$. Three different values of $\gamma$ were used. For each 
of the values of $\gamma$, some of the 500 spin states remained 
unpopulated, which makes us confident that we provided 
enough spin states. Figure 6 shows the adjacency ma- 
trix of the co-authorship network with rows and columns 
ordered according to the ground state at $\gamma = 0.5$. We 
can distinguish 3 major communities along the diago- 
nal of the matrix and a large number of smaller com- 
munities. Off-diagonal entries in the matrix show where 
communities are connected with each other. Figure 7 
shows the same adjacency matrix, but ordered according 
to the ground state obtained at $\gamma = 1$, while Figure 8 
was obtained by ordering the adjacency matrix according 
to the ground state obtained at $\gamma = 2$. We see how 
the increase of $\gamma$ leads to a higher number of smaller 
communities and a reduction in size of the major commu- 
nities, as expected. In Figure 9 we show the adjacency 
matrix in a consensus ordering of the three single order- 
ings. If the network were hierarchical with respect to $\gamma$, 
i.e. if the communities found for larger values of $\gamma$ were all 
complete sub-communities of those found at smaller $\gamma$, 
we should be able to distinguish this from the adjacency 
matrix in the same manner as in Figure 4. From the 
consensus ordering, we can see that community A from 
the $\gamma = 0.5$ ordering is composed of a number of smaller 
communities in a somewhat hierarchical manner, while 
community B seems to consist of a dense core and many 
adjacent nodes that are gradually removed as $\gamma$ increases. 
Community C again is decomposed into several smaller 
subgroups by the consensus ordering that seems to show 
two levels of hierarchy. The interpretation of the com- 
munity structure and its hierarchy in terms of research 
fields is beyond the scope of this article and shall not be 
attempted here. Rather, we intend to show that both 
hierarchical and overlapping community structure exists 
in the link patterns of real world networks and how it 
can be uncovered. 

FIG. 6: Adjacency matrix of the co-author network ordered 
according to the ground state with $\gamma = 0.5$. 

FIG. 7: Adjacency matrix of the co-author network ordered 
according to the ground state with $\gamma = 1$. 

FIG. 8: Adjacency matrix of the co-author network ordered 
according to the ground state with $\gamma = 2$. 

FIG. 9: Adjacency matrix of the co-author network ordered 
first according to the ground state with $\gamma = 0.5$. Within the 
clusters, the nodes were then ordered again according to the 
ground state with $\gamma = 1$ and within these clusters, the nodes 
were ordered according to the ground state with $\gamma = 2$. 



MINIMIZING THE HAMILTONIAN 

After having studied some properties of the ground 
state, we now turn to the problem of actually finding it. 
Though any optimization scheme that can deal with com- 
binatorial optimization problems may be implemented, 
we show the use of Simulated Annealing 
for this Potts model, because it yields high quality re- 
sults, is very general in its application and very simple 
to program. The single spin heat bath update rule at 
temperature $T = 1/\beta$ is as follows: 

$$p(\sigma_i = \alpha) = \frac{\exp\left(-\beta \mathcal{H}(\{\sigma_{j \neq i}, \sigma_i = \alpha\})\right)}{\sum_{s=1}^{q} \exp\left(-\beta \mathcal{H}(\{\sigma_{j \neq i}, \sigma_i = s\})\right)}. \qquad (22)$$

That is, the probability of spin i being in state $\alpha$ is pro- 
portional to the Boltzmann factor of the energy of the entire 
system with all other spins $j \neq i$ fixed and spin i in state 
$\alpha$. Since this is costly to evaluate, we pretend that we 
know the energy of the system with spin i in some ar- 
bitrarily chosen spin state $\phi$, which we denote by $\mathcal{H}_\phi$. 
Then we can calculate the energy of the system with i in 
state $\alpha$ as $\mathcal{H}_\phi + \Delta\mathcal{H}(\sigma_i = \phi \to \alpha)$. The energy $\mathcal{H}_\phi$ then 
factors out in (22) and we are left with: 

$$p(\sigma_i = \alpha) = \frac{\exp\left\{-\beta \Delta\mathcal{H}(\sigma_i = \phi \to \alpha)\right\}}{\sum_{s=1}^{q} \exp\left\{-\beta \Delta\mathcal{H}(\sigma_i = \phi \to s)\right\}}. \qquad (23)$$



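The change in energy $\Delta\mathcal{H}(\sigma_i = \phi \to \alpha, \phi \neq \alpha)$ is easily 
calculated for both the models of $p_{ij}$. For the simpler of 
the two with $p_{ij} = p$, we find: 

$$\Delta\mathcal{H}(\sigma_i = \phi \to \alpha, \phi \neq \alpha) = \sum_{j \neq i} (A_{ij} - \gamma p)\,\delta(\phi, \sigma_j) - \sum_{j \neq i} (A_{ij} - \gamma p)\,\delta(\alpha, \sigma_j) \qquad (24)$$

$$= \sum_{j \neq i} A_{ij}\,\delta(\phi, \sigma_j) - \gamma p (n_\phi - 1) - \sum_{j \neq i} A_{ij}\,\delta(\alpha, \sigma_j) + \gamma p\, n_\alpha \qquad (25)$$

$$= a_{i\phi} - a_{i\alpha}. \qquad (26)$$

Here, $n_\phi$ and $n_\alpha$ are the numbers of nodes in spin states $\phi$ and $\alpha$, respectively, i.e. the sizes of groups $\phi$ and $\alpha$. For the 
model with $p_{ij} = k_i k_j/2M$ we find the following update rule: 

$$\Delta\mathcal{H}(\sigma_i = \phi \to \alpha, \phi \neq \alpha) = \sum_{j \neq i} \left(A_{ij} - \gamma \frac{k_i k_j}{2M}\right)\delta(\phi, \sigma_j) - \sum_{j \neq i} \left(A_{ij} - \gamma \frac{k_i k_j}{2M}\right)\delta(\alpha, \sigma_j) \qquad (27)$$

$$= \sum_{j \neq i} A_{ij}\,\delta(\phi, \sigma_j) - \gamma \frac{k_i}{2M}(K_\phi - k_i) - \sum_{j \neq i} A_{ij}\,\delta(\alpha, \sigma_j) + \gamma \frac{k_i}{2M} K_\alpha \qquad (28)$$

$$= a_{i\phi} - a_{i\alpha}. \qquad (29)$$

Here, again, $K_\phi$ and $K_\alpha$ denote the sums of degrees of 
nodes in states $\phi$ and $\alpha$, respectively. In both cases, 
comparing the adhesion of node i with its present com- 
munity $n_\phi$ and with all other communities $n_\alpha$, the spin state 
for which the adhesion is largest is assigned the largest 
probability. Only local information about the states of 
the neighbors of a node and some global bookkeeping is 
necessary. This makes the implementation of a simulated 
annealing or any other optimization algorithm especially 
simple and efficient, even though we are dealing with an 
infinite range spin glass which has non-zero couplings be- 
tween all pairs of nodes. 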
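
The local update rule can be sketched in a few lines. The following Python function is an illustration under assumptions, not the authors' implementation: it treats the model $p_{ij} = k_i k_j/2M$, keeps the degree sums per spin state in a list `K` as the only global bookkeeping, and shifts the energy differences by their minimum before exponentiating to avoid numerical overflow. All names (`heat_bath_step`, `two_m`, the toy graph) are chosen here for illustration.

```python
import math
import random


def heat_bath_step(i, spin, adj, deg, K, two_m, gamma, beta, rng):
    """Resample the spin of node i; `spin` and `K` are updated in place."""
    q = len(K)
    phi = spin[i]
    links = [0] * q                      # links of i into each spin state
    for j in adj[i]:
        links[spin[j]] += 1
    K[phi] -= deg[i]                     # degree sums without node i itself
    adhesion = [links[s] - gamma * deg[i] * K[s] / two_m for s in range(q)]
    # Delta H(phi -> s) = a_{i,phi} - a_{i,s}; shift by the minimum for stability
    delta = [adhesion[phi] - a for a in adhesion]
    d_min = min(delta)
    weights = [math.exp(-beta * (d - d_min)) for d in delta]
    new = rng.choices(range(q), weights=weights)[0]
    K[new] += deg[i]
    spin[i] = new
    return new


# Toy example (assumed): two 4-cliques joined by the edge 3-4, q = 2 states.
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 4],
       [3, 5, 6, 7], [4, 6, 7], [4, 5, 7], [4, 5, 6]]
deg = [len(a) for a in adj]
two_m = sum(deg)
spin = [0, 0, 0, 1, 1, 1, 1, 1]          # node 3 starts in the "wrong" group
K = [sum(deg[i] for i in range(8) if spin[i] == s) for s in range(2)]

new_state = heat_bath_step(3, spin, adj, deg, K, two_m,
                           gamma=1.0, beta=1000.0, rng=random.Random(0))
print(new_state, K)
```

At the very low temperature used here the step is effectively greedy, so node 3 joins the clique it is most adhesive with. In an annealing run the same function would be called for randomly chosen nodes while `beta` is slowly increased.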



FINDING THE COMMUNITY AROUND A 
GIVEN NODE 



Often, it is desirable not to find all communities in a 
network, but to find only the community to which a par- 
ticular node belongs. This may be especially useful if the 
network is very large and detecting all communities may 
be time consuming [31]. In the framework presented in 
this article, we can do this using a fast, greedy algorithm. 
Starting from the node j we are interested in, we succes- 
sively add nodes with positive adhesion to the group, as 
long as the adhesion between the community we are forming 
and the rest of the network decreases. Adding a node i 
from the rest of the network r to the community s around 
the start node, the adhesion between s and r changes by: 

$$\Delta a_{sr}(i \to s) = a_{ir} - a_{is}. \qquad (30)$$

For $p_{ij} = p$, this can be written as 

$$\Delta a_{sr}(i \to s) = k_{ir} - k_{is} - \gamma p\,(n_r - 1 - n_s), \qquad (31)$$

where $n_r = N - n_s$ is the number of nodes in the rest of 
the network, and $n_s$ the number of nodes in the commu- 
nity. For $p_{ij} = k_i k_j/2M$, the change in adhesion reads: 

$$\Delta a_{sr}(i \to s) = k_{ir} - k_{is} - \gamma \frac{k_i}{2M}\,(K_r - k_i - K_s). \qquad (32)$$

Here, $K_r$ and $K_s$ are the sums of degrees of the rest 
of the network and the community under study, respec- 
tively, and $k_i$ is the degree of node i to be moved from r 
to s, which has $k_{is}$ links connecting it with s and $k_{ir}$ links 
connecting it with the rest of the network. It is under- 
stood that only when the adhesion of i with s is larger 
than with r, the total adhesion of s with r decreases. 
Equivalent expressions can be found for removing a node 
i from the community s and rejoining it with r. For 
$\gamma = 1$ and $p_{ij} = k_i k_j/2M$, we have $a_{is} + a_{ir} + a_{ii} = 0$, 
and $a_{ii} < 0$ by definition and close to zero for all practi- 
cal cases. Then, $a_{is}$ and $a_{ir}$ are either both positive and 
very small or have opposite sign. Choosing the node that 
gives the smallest $\Delta a_{sr}$ will then result in adding a node 
with positive coefficient of adhesion to s. It is easy to 
see that this ensures a positive coefficient of cohesion in 
the set of nodes around j. 
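
A minimal sketch of this greedy procedure is given below, assuming $p_{ij} = k_i k_j/2M$ and using equation (32). It only adds nodes and omits the removal step mentioned in the text; restricting the candidates to neighbours of the current community is a further simplifying assumption made here. The function name `community_around` and the toy graph are illustrative.

```python
def community_around(adj, start, gamma=1.0):
    """Greedily grow the community s around `start`; returns the node set s.

    adj : dict mapping node -> set of neighbours (undirected, unweighted)
    """
    deg = {i: len(adj[i]) for i in adj}
    two_m = sum(deg.values())
    s = {start}
    K_s = deg[start]
    while True:
        K_r = two_m - K_s
        frontier = {j for i in s for j in adj[i]} - s
        best, best_delta = None, 0.0
        for i in frontier:
            k_is = len(adj[i] & s)
            k_ir = deg[i] - k_is
            # change of the adhesion between s and r, equation (32)
            delta = k_ir - k_is - gamma * deg[i] * (K_r - deg[i] - K_s) / two_m
            if delta < best_delta:
                best, best_delta = i, delta
        if best is None:          # no move lowers the adhesion any further
            return s
        s.add(best)
        K_s += deg[best]


# Toy graph (assumed): two 4-cliques {0..3} and {4..7} joined by the edge 3-4.
adj = {n: set() for n in range(8)}
for block in ((0, 1, 2, 3), (4, 5, 6, 7)):
    for u in block:
        adj[u] |= set(block) - {u}
adj[3].add(4)
adj[4].add(3)

print(sorted(community_around(adj, 0)))
print(sorted(community_around(adj, 5)))
```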

In order to benchmark the performance of this ap- 
proach, we applied it again to computer generated test 
networks as done for the algorithm on the entire network 
in [1]. We used networks of 128 nodes, which are grouped 
into four equal sized communities of size 32. Each node 
has an average degree of $\langle k \rangle = 16$. The average number 
of links to members of the same community $\langle k_{in} \rangle$ and to 
members of different communities $\langle k_{out} \rangle$ is then varied, 
but always ensuring $\langle k_{in} \rangle + \langle k_{out} \rangle = \langle k \rangle$. Hence, decreas- 
ing $\langle k_{in} \rangle$ renders the problem of community detection more 
difficult. Starting from a particular node, we are inter- 
ested in the performance of the algorithm in discovering 
the community around it. We measure the percentage 
of nodes that are correctly identified as belonging to the 
community around the start node as sensitivity and the 
percentage of nodes that are correctly identified as not 
belonging to the community as specificity. 
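
The two benchmark measures can be written down compactly. The sketch below follows the definitions in the preceding paragraph; the function name `sensitivity_specificity` and the example sets are assumptions for illustration.

```python
def sensitivity_specificity(found, truth, all_nodes):
    """Compare the detected community `found` with the designed one `truth`."""
    found, truth, all_nodes = set(found), set(truth), set(all_nodes)
    outside = all_nodes - truth
    sensitivity = len(found & truth) / len(truth)        # members recovered
    specificity = len(outside - found) / len(outside)    # non-members kept out
    return sensitivity, specificity


# Example: 128 nodes, designed community 0..31; the detected set misses
# four true members and wrongly includes two outsiders.
all_nodes = range(128)
truth = range(32)
found = set(range(28)) | {40, 41}
print(sensitivity_specificity(found, truth, all_nodes))
```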

Figure 10 shows the results obtained for different val- 
ues of $\langle k_{in} \rangle$ at $\gamma = 1$ and using $p_{ij} = k_i k_j/2M$ as model 
of the connection probability. We note that this ap- 
proach performs rather well for a large range of $\langle k_{in} \rangle$ 
with good sensitivity and specificity. In contrast to the 
benchmarks for running the simulated annealing on the 
entire network as shown in [1], we obtain a sensitivity 
that is generally larger than the specificity. This shows 
that running the simulated annealing on the entire net- 
work tends to mistakenly group things apart that do not 
belong apart by design, while constructing the commu- 
nity around a given node tends to group things together 
that do not belong together by design. This behavior 
is understandable, since working on the entire network 
amounts to effectively implementing a divisive method, 
while starting from a single node means implementing an 
agglomerative method. 

FIG. 10: Benchmark of the algorithm for discovering the com- 
munity around a given node in networks with known com- 
munity structure. We used networks of 128 nodes and four 
communities. The average degree of the nodes was fixed to 
16, while the average number of intra-community links $\langle k_{in} \rangle$ 
was varied. Sensitivity measures the fraction of nodes cor- 
rectly assigned to the community around the start node, while 
specificity measures the fraction of nodes correctly kept out 
of the community around the start node. 



EXPECTATION VALUES FOR THE 
MODULARITY 

In order to assess the statistical significance of the mod- 
ularities found with any algorithm, it is necessary to 
compare them with expectation values for random net- 
works. This is of course always possible by rewiring the 
network randomly, keeping the degree distribution 
invariant and then running a community detection al- 
gorithm again, comparing the result to the original net- 
work. This method, however, can only give an answer to 
what a particular community detection algorithm may 
find in a random network and hence depends on the very 
method of community detection used. Much better seems 
a method to compare the results of a community detec- 
tion algorithm with a theoretical result, obtained inde- 
pendently of any algorithm. We have already seen that 
the problem of community detection can be mapped onto 
finding the ground state of an infinite range spin glass. In 
the limit of large N, the local field distribution of infinite 
range spin glasses is Gaussian and can hence be char- 
acterized by only the first two moments of the coupling 
distribution, the mean and the variance. The couplings 
used in the study of modularity are $J_{ij} = A_{ij} - \gamma p_{ij}$, 
which have a mean independent of the particular form of 
$p_{ij}$: 

$$J_0 = (1 - \gamma)\,p, \qquad (33)$$

which is zero in the case of the "natural partition" at 
$\gamma = 1$. The variance amounts to: 

$$J^2 = p - (2\gamma - \gamma^2)\,\langle p_{ij}^2 \rangle. \qquad (34)$$



Since the mean of the coupling distribution couples to 
the magnetization of the ground state, all coupling dis- 
tributions with zero mean will have zero magnetization 
in the ground state. Hence, for a random graph we 
expect maximum modularity for an equi-partition. A 
number of well known results exist in the literature for 
equi-partitions. Fu and Anderson [33] have given results 
for bi-partitionings and Kanter and Sompolinsky for q- 
partitionings [34]. With these, we can write immediately 
for the modularity at $\gamma = 1$: 

$$Q = \frac{N^{3/2}}{M}\, J\, \frac{U(q)}{q}, \qquad (35)$$



where $U(q)$ is the ground state energy of a q-state Potts 
model with Gaussian couplings of zero mean and vari- 
ance $J^2$. For large q, we can approximate $U(q) \approx \sqrt{q \ln q}$. 
In Table I we give values of $U(q)/q$ for some small values of q, obtained by 
using the exact formula for calculating $U(q)$ from [34]. 
We see that maximum modularity is obtained at $q = 5$, 
though the value of $U(q)/q$ for $q = 4$ is not much differ- 
ent from it. This qualitative behavior, that dense random 
graphs tend to cluster into only a few large communities, 
is confirmed by our numerical experiments. 

TABLE I: Values of $U(q)/q$ for various values of q obtained 
from [34], which can be used to approximate the expected 
modularity with equation (35). 

q        2      3      4      5      6      7      8      9
U(q)/q   0.384  0.464  0.484  0.485  0.479  0.471  0.461  0.452

By rewrit- 
ing $M = pN^2/2$ and under the assumption of $p_{ij} = p$ 
as in the case of Erdos Renyi (ER) random graphs [35], 
we can further simplify equation (35) and write for the 
maximum value of the modularity of an ER random graph 
with connection probability p and N nodes: 

$$Q = 0.97 \sqrt{\frac{1 - p}{pN}}, \qquad (36)$$

where we have already made use of the fact that $q = 5$ 
makes the modularity maximal. Figure 11 shows the 
comparison of equation (36) and experiments where we 
have numerically maximized the modularity using a sim- 
ulated annealing approach as described in an earlier sec- 
tion. We see that the prediction fits the data well for 
dense graphs and that modularity decays as a function 
of $(pN)^{-1/2}$ instead of $(2/pN)^{2/3}$ as proposed in [20]. 
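
As a numerical check of how the prefactor 0.97 arises, the sketch below evaluates the expected modularity of an ER random graph from the tabulated values of $U(q)/q$, assuming $M = pN^2/2$ and $J^2 = p(1-p)$, i.e. equation (34) at $\gamma = 1$ with $p_{ij} = p$. The function name `expected_modularity_er` and the dictionary `U_OVER_Q` are illustrative names, and the form $Q = (N^{3/2}/M)\,J\,U(q)/q$ is the one assumed for equation (35).

```python
import math

# Values of U(q)/q for q = 2..9 as listed in Table I.
U_OVER_Q = {2: 0.384, 3: 0.464, 4: 0.484, 5: 0.485,
            6: 0.479, 7: 0.471, 8: 0.461, 9: 0.452}


def expected_modularity_er(p, n, q=None):
    """Q = (N^(3/2) / M) * J * U(q)/q with M = p N^2 / 2 and J^2 = p (1 - p)."""
    if q is None:
        q = max(U_OVER_Q, key=U_OVER_Q.get)   # q with the largest U(q)/q
    m = p * n * n / 2
    j = math.sqrt(p * (1 - p))
    return n ** 1.5 / m * j * U_OVER_Q[q]


p, n = 0.001, 10000                            # average degree pN = 10
print(expected_modularity_er(p, n))
print(0.97 * math.sqrt((1 - p) / (p * n)))     # closed form of equation (36)
```

Both lines give the same number, since $2\,U(5)/5 = 0.97$.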

While the value of Q for random graphs from the Potts 
spin glass is rather close to the actual situation for sparse 
random graphs, the number of communities at which 
maximum modularity is achieved is not. In [20], it had 
already been shown that the number of communities for 
which the modularity reaches a maximum is $\sqrt{N}$ for tree- 
like networks with $\langle k \rangle = 2$. Unfortunately, no plot was 
given for the number of communities found in denser net- 
works. Our numerical experiments on large Erdos Renyi 
random graphs also show that the number of commu- 
nities found in sparse networks tends to increase as $\langle k \rangle$ 
decreases. 

FIG. 11: Modularity of Erdos Renyi random graphs with av- 
erage connectivity $pN = \langle k \rangle$ compared with the estimation 
from equation (36). For the experiment, random graphs with 
$N = 10000$ were used. 

Even though we have seen that in general, recursive 
bi-partitioning will not lead to an optimal community 
assignment, we shall still use this approach for random 
graphs. Maximum modularity for random graphs is 
achieved for equipartitions. Partitioning the network re- 
cursively until no further improvement of Q is possible 
allows us to find the number of communities in a ran- 
dom graph. The number of cut edges $C = C(N, M)$ in 
any partition will be a function of the number of nodes in 
the remaining part and the number of connections within 
this remaining part and their distribution. We note that 
the M connections will be distributed into internal and 
external links per node, $k_{in} + k_{out} = k$. This allows us 
to write $C = N \langle k_{out} \rangle / 2$ for a bi-partition. After each 
partition, the number of internal connections a node has 
decreases due to the cut. We use these results in order 
to approximate the number of cut edges after b recursive 
bi-partitions, which lead to $2^b$ parts: 

$$C = \sum_{t=1}^{b} 2^t\, \frac{N}{2^{t+1}}\, \langle k_{out,t} \rangle = \frac{N}{2} \sum_{t=1}^{b} \langle k_{out,t} \rangle, \qquad (37)$$

where $\langle k_{out,t} \rangle$ is the average number of external edges 
a node gains after cut t. Since for an Ising model, the 
ground state energy is $-E_{GS} = M - 2C$, we find: 

$$\langle k_{in} \rangle = \frac{\langle k \rangle}{2} - \frac{E_{GS}}{N} = \langle k \rangle - \langle k_{out} \rangle. \qquad (38)$$

This shows that for any bi-partition we can, on aver- 
age, always satisfy more than half of the links of every 
node. This also means that any bi-partition 
will satisfy the definition of community given by Radicchi 
et al. at least on average, which further means that every 
random graph has, at least on average, a community 
structure, assuming Radicchi's definition of community in 
a strong sense ($k_i^{in} > k_i^{out}$) for every node of the ran- 
dom graph. The definition of community in a weak sense, 
$\sum_i k_i^{in} > \sum_i k_i^{out}$, can always be fulfilled in a random 
graph. 

From (38) we can then calculate the total number of 
edges cut after t recursions according to (37) using results 
of Fu and Anderson [33] again, who find for a bi-partition: 

$$C = \frac{M}{2} \left( 1 - c \sqrt{\frac{1 - p}{pN}} \right), \qquad (39)$$

with a constant of $c = 1.5266 \pm 0.0002$. We can write 

$$\langle k_{in} \rangle = \frac{pN + c \sqrt{pN(1 - p)}}{2} = pN - \langle k_{out} \rangle, \qquad (40)$$

from which we can calculate (37), substituting pN with 
the appropriate $\langle k_{in} \rangle$ in every step of the recursion. The 
modularity can then be written: 

$$Q = \frac{2^b - 1}{2^b} - \frac{1}{\langle k \rangle} \sum_{t=1}^{b} \langle k_{out,t} \rangle. \qquad (41)$$

Now we only need to find the number of recursions b 
that maximizes Q. Since the optimal number of recur- 
sions will depend on pN, we also find an estimation of 
the number of communities in the network. Figure 12 
shows a comparison between the theoretical prediction 
of the maximum modularity that can be obtained from 
equation (41) and the experiment. The improvement of (41) over (36) must 
be due to the possibility of having larger numbers of com- 
munities, since (39) also assumes a Gaussian distribution 
of local fields, which is a rather poor approximation for 
the sparse graphs under study. Again, we find that the 
modularity behaves asymptotically like $\langle k \rangle^{-1/2}$, as already 
predicted from the Potts spin glass and contrary to the 
estimation in [20]. 
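
The search for the optimal number of recursions can be sketched numerically. The code below is one possible reading of the procedure, under explicit assumptions: it works in the sparse limit where $1 - p \approx 1$, so that a bi-partition of a part with average internal degree $k$ cuts $\langle k_{out} \rangle = (k - c\sqrt{k})/2$ links per node, it replaces $k$ by the remaining internal degree after every cut, and it evaluates $Q(b) = 1 - 2^{-b} - \frac{1}{\langle k \rangle}\sum_{t=1}^{b} \langle k_{out,t} \rangle$. The function name and the stopping rule for non-positive $\langle k_{out} \rangle$ are choices made here.

```python
import math

C_FA = 1.5266   # constant c of the bi-partitioning result quoted in the text


def recursive_bipartition_modularity(k, max_depth=20):
    """Return (b, Q): the number of recursions maximizing Q and its value."""
    best_b, best_q = 0, 0.0
    k_in, cut_sum = k, 0.0
    for b in range(1, max_depth + 1):
        k_out = (k_in - C_FA * math.sqrt(k_in)) / 2   # links cut per node
        if k_out <= 0:        # the estimate leaves its range of validity
            break
        cut_sum += k_out
        k_in -= k_out         # remaining internal degree after this cut
        q_mod = 1 - 2.0 ** -b - cut_sum / k
        if q_mod > best_q:
            best_b, best_q = b, q_mod
    return best_b, best_q


b, q_mod = recursive_bipartition_modularity(10.0)
print(b, 2 ** b, q_mod)   # recursions, number of communities, modularity
```

Under these assumptions the estimate for $\langle k \rangle = 10$ lies above the value $0.97/\sqrt{10} \approx 0.31$ obtained from the Potts spin glass result for the same average degree, in line with the improvement discussed in the text.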

Figure 13 shows the comparison of the number of com- 
munities estimated from (41) and the numerical experi- 
ments on random graphs. The good agreement between 
experiment and prediction is interesting, given the fact 
that (41) allows only powers of two as the number of 
communities. For dense graphs, the Potts limit of only a 
few communities is recovered. We see that sparse ran- 
dom graphs cluster into a large number of communities, 
while dense random graphs cluster into only a handful 
of large communities. Most importantly, sparse random 
graphs exhibit very large values of modularity. These 
large values are only due to their sparseness and not due 
to small size. We also stress that statistically significant 
modularity must exceed the expectation values of mod- 
ularity obtained from a suitable null model of the graph. 
If this null model is an Erdos Renyi random graph, then 
there is very little improvement possible over the val- 
ues of modularity obtained for the null model for sparse 
graphs. 

FIG. 12: Modularity of Erdos Renyi random graphs with av- 
erage connectivity $pN = \langle k \rangle$ compared with the estimation 
from equation (41). For the experiment, random graphs with 
$N = 10000$ were used. 

FIG. 13: Number of communities found in Erdos Renyi ran- 
dom graphs with average connectivity $pN = \langle k \rangle$ compared 
with the estimation from equation (41). For the experiment, 
random graphs with $N = 10000$ were used. 



CONCLUSION 

In this article, we have tried to elucidate some of the 
general properties of the problem of community detec- 
tion in complex networks. We have shown that it can be 
mapped onto finding the ground state of an infinite range 
Potts spin glass from a very simple and general one-pa- 
rameter ansatz, which is also valid for weighted networks 
and directed networks. We could show that our ansatz 
leads to known modularity measures in a natural way. 
We have introduced the concepts of cohesion and adhe- 
sion into the terminology of networks as a measure of the 
degree to which groups of nodes belong together or apart 
in a community structure. From the properties of the 
ground state as the minimal energy or maximally modu- 
lar configuration, we could deduce a number of properties 
that define a community. By studying the ground state 
structure and its changes under parameter variation, we 
could also show how hierarchical and overlapping com- 
munity structures manifest themselves. Comparisons of 
our definition with other definitions of communities were given. We 
have provided efficient update rules for single spin heat 
bath simulated annealing algorithms that allow one to opti- 
mize the spin configuration of an infinite range system 
by using solely sparse local information and some global 
bookkeeping. We have extended the algorithm of find- 
ing the entire community structure of the whole network 
to finding only the community around a given node and 
we have given benchmarks for the performance of this 
extension. Finally, we have summarized known results 
from the theory of infinite range spin glasses in order to 
shed some light on the problem of community detection 
in Erdos Renyi random graphs. We have seen that sparse 
ER random graphs may show very large modularities and 
that the expected modularity of an ER random graph de- 
cays as $\sqrt{1/\langle k \rangle}$, independent of the size of the graph. Fur- 
ther, we have seen that sparse ER random graphs tend 
to cluster into many small communities, while for dense 
random graphs, maximum modularity is achieved for a 
very small number of communities only, which is inde- 
pendent of the average degree of the network. We stress 
the importance of comparing the values of modularity 
found in real world networks with expectation values of 
appropriate null models in order to assess their statisti- 
cal significance. Only graphs which lead to modularities 
larger than the expectation value should be called mod- 
ular. In this respect, it is understood that Erdos Renyi 
random graphs contain communities, but this alone does 
not make these graphs modular. 